A study of error potentials in P300 spelling

Final project for the "Exploring Neural Data" course on Coursera

Dataset description

For this project we decided to analyze the BCI Challenge @ NER 2015 dataset available on Kaggle. This dataset includes EEG recordings from 16 subjects (7 men, mean age = 28.2 $\pm$ 5.1 (SD), range 20-37) performing a P300 spelling task.

In the P300 spelling task the subject is presented with a $6 \times 6$ matrix of characters on the screen. The cells of the matrix are flashed while the subject focuses on the target character. A flash of the attended cell elicits a wave in the EEG called the P300 event-related potential (ERP). Using machine learning techniques it is possible to detect this wave, and thereby the character the subject wants to spell. P300 spelling is therefore one of the brain-computer interfacing (BCI) paradigms.

Unfortunately, a BCI cannot typically detect the P300 wave with perfect accuracy. One way to remedy this is to provide feedback to the subject (which character was detected) and measure the EEG response to that feedback. Correct and incorrect feedback elicit different responses - the latter evokes an error-related potential (ErrP). This response has two distinct components - the error-related negativity (ERN or Neg-ErrP) and the error-related positivity (Pe or Pos-ErrP).

Detecting ErrPs can improve the accuracy of the P300 speller - when an error is detected, the detected character can be automatically replaced with the second best guess, or the subject can attempt the selection again.


In the dataset we analyzed, the subjects spelled 60 characters in each of the first four sessions and 100 characters in the fifth and final session. After each character selection, feedback was provided. Since the target characters were predetermined, each feedback can be labeled as correct or incorrect. In total there were 5440 feedbacks, 340 per subject. Below we provide a random excerpt from the table describing each feedback trial (a "Prediction" of 1 means the feedback was correct).

Index Subject Session FeedbackNo Prediction
1968 13 5 29 0
1297 11 5 38 0
822 7 3 23 1
768 7 2 29 1
2867 17 3 28 0

During the task, EEG was recorded with 56 passive Ag/AgCl sensors positioned according to the 10/20 system. The signals were originally sampled at 600 Hz, but the available dataset was downsampled to 200 Hz. Besides EEG, eye activity was also recorded with an EOG channel. Below we show a part of the recording, omitting all channels except Cz and EOG. Note: FeedbackEvent marks when the feedback was presented to the subject (the value changes to 1 at the moment of presentation).

Index Time (s) Cz EOG FeedbackEvent
0 0.000 311.298295 -906.668876 0
1 0.005 551.888548 -1484.107119 0
2 0.010 478.480250 -1313.435186 0
3 0.015 502.729161 -1391.966973 0
4 0.020 479.678270 -1347.494166 0
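
As an illustration, a minimal sketch of loading one session's recording and extracting these columns with pandas; the file name and path follow our copy of the Kaggle download and should be treated as assumptions.

```python
import pandas as pd

# Assumed file name/layout from the Kaggle "BCI Challenge @ NER 2015" download.
recording = pd.read_csv("train/Data_S02_Sess01.csv")

# Keep only the columns shown in the excerpt above.
excerpt = recording[["Time", "Cz", "EOG", "FeedbackEvent"]]
print(excerpt.head())

# Sample indices at which feedback was presented to the subject.
feedback_samples = recording.index[recording["FeedbackEvent"] == 1]
```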

Further details about the experiment can be found in Perrin et al. (2012).

Hypotheses

Ideally, we could just put an EEG headset on and start using a BCI such as the P300 speller described above. For now, however, BCIs cannot do this - they need to be calibrated to a particular user. A common problem is that once a BCI is calibrated and put into operation, its performance degrades over time and it needs to be recalibrated. This is inconvenient and reduces the usefulness of BCIs.

One assumed source of performance degradation is the nonstationary nature of EEG. The properties of the recorded signals change over time due to a number of factors: artifacts (sweating, muscle tension), changes in the skin-electrode contact, fatigue, changes in attention, etc. We were curious whether we could find this kind of temporal nonstationarity in the amplitudes of error potentials over the course of an experiment. Finding such nonstationarity would indicate the need to adapt a BCI even within a single experiment.

Our hypothesis was therefore that the amplitudes of the error-related negativity (Neg-ErrP) and positivity (Pos-ErrP) diminish over the course of an experiment.

Methods

In our analysis we focused on a single channel - Cz, which is located on the top of the head. This channel was previously found to exhibit good discrimination between correct and incorrect feedbacks (Perrin et al., 2012).

In order to reduce various sources of noise (electrical and biological - e.g. power line noise, muscle noise, skin potentials) we first filtered the signal. We used a 4th-order digital bandpass IIR Butterworth filter with half-power (-3 dB) cutoff points at 0.1 Hz and 30 Hz.

To test the operation of the filter we generated a signal mimicking an ErrP (a single period of a sinusoid). We added white Gaussian noise to this signal and an additional 50 Hz sinusoid mimicking power line noise. All filtering was zero-phase.

First we show the clean test signal:

Next figure shows the test signal corrupted by noise and the signal after filtering:

The difference between the original signal and the signal corrupted by noise and then filtered is shown in the next figure:

Trick: instead of filtering with the scipy.signal.lfilter() function, we used scipy.signal.filtfilt(). This function filters the signal twice, once forward and once backward, which cancels the phase delay that the filter would otherwise introduce into the signal.
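
A minimal sketch of the filter design and the synthetic test described above; the timing, frequency and noise amplitude of the synthetic ErrP are our own placeholder choices.

```python
import numpy as np
from scipy import signal

fs = 200.0                                    # sampling rate of the dataset, Hz

# 4th-order Butterworth band-pass, -3 dB points at 0.1 Hz and 30 Hz.
b, a = signal.butter(4, [0.1, 30.0], btype="bandpass", fs=fs)

# Synthetic "ErrP": a single period of a 5 Hz sinusoid (placeholder values).
t = np.arange(0, 2.0, 1.0 / fs)
clean = np.zeros_like(t)
window = (t >= 0.9) & (t < 1.1)               # 200 ms deflection around t = 1 s
clean[window] = np.sin(2 * np.pi * 5 * (t[window] - 0.9))

# Corrupt it with white Gaussian noise and a 50 Hz "power line" sinusoid.
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(t.size) + 0.5 * np.sin(2 * np.pi * 50 * t)

# Zero-phase filtering: filtfilt runs the filter forward and then backward,
# so no phase delay is introduced into the signal.
filtered = signal.filtfilt(b, a, noisy)
```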


After filtering there are still many artifacts related to blinks and eye movements in the channel of interest. This is clearly visible in the following figure, which shows an eye blink appearing in both the EOG and Cz channels:

To remove these artifacts, linear regression can be used. We assumed the following linear model:

$\mathrm{Cz} = \beta \times \mathrm{EOG} + \mathrm{neural~activity}$.

Then activity of interest can be obtained with:

$\mathrm{neural~activity} = \mathrm{Cz} - \hat{\beta} \times \mathrm{EOG}$,

where $\hat{\beta}$ is the estimated regression coefficient. To estimate the regression coefficient we used the "statsmodels" package.
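
A minimal sketch of this regression step with "statsmodels"; the variable names are ours, and the synthetic signals only stand in for the filtered recordings.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder signals standing in for the filtered Cz and EOG recordings.
rng = np.random.default_rng(0)
eog = rng.standard_normal(10000)
cz = 0.2 * eog + rng.standard_normal(10000)   # neural activity + EOG bleed-through

# Fit Cz = beta * EOG by ordinary least squares (no intercept, as in the
# model above; the signals are roughly zero-mean after band-pass filtering).
beta_hat = sm.OLS(cz, eog).fit().params[0]

# Subtract the estimated ocular contribution to recover the neural activity.
cz_clean = cz - beta_hat * eog
```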


After cleaning the data, it needs to be split into trials. We took 200 ms before each feedback event and 800 ms after it, giving 1-second trials or epochs. The baseline period (before the feedback) needs to be zero-mean to facilitate averaging over trials. To achieve this, we subtracted the mean value of the baseline period from the whole trial, individually for each trial.
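
The epoching and baseline correction can be sketched as follows, assuming `cz_clean` holds the cleaned Cz signal and `feedback_samples` the sample indices of the feedback events (names are ours, as in the earlier sketches).

```python
import numpy as np

fs = 200                                      # sampling rate, Hz
pre, post = int(0.2 * fs), int(0.8 * fs)      # 200 ms before, 800 ms after

def extract_epochs(cz_clean, feedback_samples):
    """Cut 1-second trials around each feedback event and baseline-correct them."""
    epochs = []
    for onset in feedback_samples:
        epoch = np.array(cz_clean[onset - pre: onset + post], dtype=float)
        # Subtract the mean of the pre-feedback baseline from the whole trial.
        epoch -= epoch[:pre].mean()
        epochs.append(epoch)
    return np.vstack(epochs)                  # shape: (n_trials, 200 samples)
```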


Technical note

A technical problem we faced was efficient and convenient representation and manipulation of the data. The description of each trial was stored as a row of a Pandas dataframe. We initially wanted to store the EEG data, organized as a 2D NumPy array, in an additional cell of that row. This turned out to be inconvenient: once the dataframe was saved and reloaded, the NumPy array ended up as a string representation, and we found no function to easily convert it back into an array. We therefore kept the EEG data in a separate dictionary whose keys correspond to the row indices of the table and whose values are NumPy arrays, so that each trial description in the dataframe is linked to the recorded data in the dictionary. To save the results, we used CSV files for the dataframe and binary files for the EEG data via the Python "pickle" module. This change also significantly sped up the execution of the code.
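
Roughly, the resulting layout looks like this (file names and placeholder values are ours):

```python
import pickle
import numpy as np
import pandas as pd

# Placeholder objects standing in for our real data:
#   trials - one row per feedback (subject, session, feedback number, label),
#   epochs - dict mapping a row index of `trials` to its epoched Cz data.
trials = pd.DataFrame({"Subject": [2, 2], "Session": [1, 1],
                       "FeedbackNo": [1, 2], "Prediction": [1, 0]})
epochs = {0: np.zeros(200), 1: np.zeros(200)}

# Trial descriptions go to CSV, the raw arrays to a binary pickle file.
trials.to_csv("trials.csv")
with open("epochs.pkl", "wb") as f:
    pickle.dump(epochs, f)

# Loading them back keeps the arrays as arrays (no string round-trip).
trials = pd.read_csv("trials.csv", index_col=0)
with open("epochs.pkl", "rb") as f:
    epochs = pickle.load(f)
```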

Results

Before presenting the results, we show the sample sizes for each subject and session. Since we were primarily interested in the incorrect feedbacks, which presumably elicit ErrPs, we show the number of incorrect trials.

Subject 2 6 7 11 12 13 14 16 17 18 20 21 22 23 24 26 All
Session
1 10 1 4 14 19 22 23 16 6 9 17 2 3 18 6 25 195
2 17 3 7 18 19 30 16 19 17 14 14 2 4 24 9 23 236
3 24 5 3 18 27 35 18 22 24 14 18 4 6 21 14 30 283
4 25 4 7 21 27 25 18 25 26 15 19 5 5 17 23 21 283
5 44 11 12 44 59 50 50 47 41 27 36 15 9 39 49 60 593
All 120 24 33 115 151 162 125 129 114 79 104 28 27 119 101 159 1590

ERP averages

The following figure shows the average of all the erroneous and correct trials, and their difference. The average has been computed over all the sessions and subjects.

Both effects - error-related negativity and positivity - can clearly be seen in the averages, especially in the difference wave. The magenta and cyan lines outline the intervals of interest for the Neg-ErrP and Pos-ErrP, respectively.
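Under the storage layout sketched in the technical note, these grand averages and the difference wave can be computed roughly as follows (a sketch; the variable names are ours):

```python
import numpy as np

# Assumed from the sketches above: `trials` is the per-feedback table (with a
# "Prediction" column, 1 = correct feedback) and `epochs` maps each row index
# of `trials` to the corresponding baseline-corrected Cz epoch.
data = np.vstack([epochs[i] for i in trials.index])
correct = trials["Prediction"].to_numpy() == 1

avg_correct = data[correct].mean(axis=0)      # average over correct trials
avg_error = data[~correct].mean(axis=0)       # average over erroneous trials
difference = avg_error - avg_correct          # the difference wave
```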

Effect of time on ErrP amplitudes

After selecting the intervals of interest, we calculated the mean amplitude within these intervals for each trial. In the following figure we show the amplitude of Neg-ErrP over the course of the experiment, aggregated for all the subjects.

As can be seen from the distribution of amplitudes, and from the fitted regression line, there is no significant relation between the elapsed time and the Neg-ErrP amplitude.

For the Pos-ErrP our results suggest a very weak relation ($r = -0.05$) that was not statistically significant. This can be seen in the following figure.
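
A sketch of how the per-trial interval amplitudes and the regression against time can be computed; the interval bounds below are placeholders for the ones marked in the figure, and we use the feedback number from the trial table as a proxy for elapsed time.

```python
import numpy as np
from scipy import stats

fs = 200
t0 = 0.2                                  # feedback arrives 200 ms into each epoch

def interval_mean(epoch, start_s, stop_s):
    """Mean amplitude of one epoch within [start_s, stop_s] after the feedback."""
    i0 = int((t0 + start_s) * fs)
    i1 = int((t0 + stop_s) * fs)
    return epoch[i0:i1].mean()

# `trials` and `epochs` are the trial table and per-trial Cz epochs from the
# earlier sketches; the 0.25-0.35 s window is a placeholder interval.
neg_amp = np.array([interval_mean(epochs[i], 0.25, 0.35) for i in trials.index])

# Regression of amplitude against elapsed time, proxied by the feedback number.
slope, intercept, r, p, stderr = stats.linregress(trials["FeedbackNo"], neg_amp)
print(f"r = {r:.2f}, p = {p:.3f}")
```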

Future work

On the one hand, we found no correlation between time and the amplitude of the error-related negativity (Neg-ErrP). On the other hand, we observed an increasing number of erroneous feedbacks with time, which could be a result of changes in the P300 wave. We also found that the amplitude of the error-related positivity (Pos-ErrP) showed a slight decreasing trend over time, which could be explained by fatigue and a loss of focus on the task.

It is known that BCIs are prone to errors in recognizing the subject's intent to perform a specific task. We believe that further investigation of the spatio-temporal evolution of ERPs over time could yield interesting insights. More specifically, analyzing only one channel may have restricted our findings. Furthermore, a longer recording session would be better suited to assessing this time-amplitude correlation. In addition, it would be interesting to explore this correlation over multiple sessions recorded on different days. Such a protocol would allow us to disentangle the effects of fatigue (due to a long experimental session) from those of task-related training (due to multiple training sessions).

Another limitation of our analysis was aggregating the data of all subjects together. This might have hidden subject-specific patterns: attention and fatigue have effects opposite to those of training, so they might have cancelled out across subjects, or even within subjects.

We intend to continue working on this dataset and to use our results in the Kaggle challenge, whose goal is to design an error-detecting BCI that is subject-independent and robust to changes in EEG properties over time.

References

Chavarriaga, Ricardo, Aleksander Sobolewski, and José del R. Millán. "Errare machinale est: the use of error-related potentials in brain-machine interfaces." Frontiers in neuroscience 8 (2014).

Perrin, Margaux, et al. "Objective and subjective evaluation of online error correction during P300-based spelling." Advances in Human-Computer Interaction 2012 (2012).